Semantic Video Segmentation by Gated Recurrent Flow Propagation
Authors
Abstract
Semantic video segmentation is challenging due to the sheer amount of data that needs to be processed and labeled in order to construct accurate models. In this paper we present a deep, end-to-end trainable methodology for video segmentation that is capable of leveraging information present in unlabeled data in order to improve semantic estimates. Our model combines a convolutional architecture with a spatio-temporal transformer recurrent layer that is able to temporally propagate labeling information by means of optical flow, adaptively gated based on its locally estimated uncertainty. The flow, the recognition and the gated temporal propagation modules can be trained jointly, end-to-end. The temporal, gated recurrent flow propagation component of our model can be plugged into any static semantic segmentation architecture and turn it into a weakly supervised video processing one. Our extensive experiments on the challenging Cityscapes and CamVid datasets, based on multiple deep architectures, indicate that the resulting model can leverage unlabeled temporal frames, alongside a labeled one, to improve both the video segmentation accuracy and the consistency of its temporal labeling, at no additional annotation cost and with little extra computation.
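To make the propagation mechanism concrete, the sketch below is an illustrative assumption, not the authors' released code: per-frame logits from any static segmentation network are backward-warped to the current frame with optical flow, then fused with the current static prediction through a learned per-pixel confidence gate. The names `warp`, `gated_propagation`, and `gate_net` are hypothetical; `gate_net` stands in for any small convolutional network ending in a sigmoid, playing the role of the uncertainty-based gate.

```python
# Minimal sketch of gated, flow-based label propagation (assumed, not the
# authors' implementation). Flow is assumed to map each current-frame pixel
# to its corresponding location in the previous frame, in pixel units.
import torch
import torch.nn.functional as F

def warp(features, flow):
    """Backward-warp `features` (N, C, H, W) into the current frame using a
    dense flow field (N, 2, H, W) whose channels are (dx, dy) in pixels."""
    n, _, h, w = features.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(features.device)  # (2, H, W)
    coords = base.unsqueeze(0) + flow                                # (N, 2, H, W)
    # Normalize sampling coordinates to [-1, 1] as grid_sample expects.
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                             # (N, H, W, 2)
    return F.grid_sample(features, grid, align_corners=True)

def gated_propagation(prev_logits, cur_logits, flow, gate_net):
    """Fuse warped previous-frame logits with the current static prediction,
    weighting each pixel by a learned confidence gate in [0, 1]."""
    warped = warp(prev_logits, flow)
    # The gate sees both predictions; in practice it could also look at the
    # flow or the warp error to estimate where propagation is unreliable.
    gate = torch.sigmoid(gate_net(torch.cat((warped, cur_logits), dim=1)))
    return gate * warped + (1.0 - gate) * cur_logits
```

Applied recursively over a clip, a gate of this kind lets unlabeled neighbouring frames contribute to the prediction at the labeled frame, which is the weakly supervised setting described in the abstract.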
Similar papers
Supplementary Material for Video Propagation Networks
In this supplementary material, we present experiment protocols and additional qualitative results for experiments on video object segmentation, semantic video segmentation and video color propagation. Table 1 shows the feature scales and other parameters used in different experiments. Figures 1 and 2 show some qualitative results on video object segmentation, with some failure cases in Fig. 3. Figure 4 sho...
Fast Semantic Segmentation on Video Using Motion Vector-Based Feature Interpolation
Models optimized for accuracy on challenging, dense prediction tasks such as semantic segmentation entail significant inference costs, and are prohibitively slow to run on each frame in a video. Since nearby video frames are spatially similar, however, there is substantial opportunity to reuse computation. Existing work has explored basic feature reuse and feature warping based on optical flow,...
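A rough sketch of the feature-reuse idea mentioned above, given as an assumption for illustration rather than the cited paper's code: the expensive backbone runs only on keyframes, and on intermediate frames the cached features are interpolated using upsampled block motion vectors before a lightweight prediction head is applied. `warp_features`, `segment_frame`, `backbone`, and `head` are hypothetical names, and the motion vectors are assumed to already be expressed in feature-map pixel units.

```python
# Sketch of motion vector-based feature interpolation (assumed implementation).
import torch
import torch.nn.functional as F

def warp_features(feat, flow):
    """Backward-warp feat (N, C, H, W) with a dense pixel-offset flow (N, 2, H, W)."""
    n, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(feat.device).unsqueeze(0) + flow
    gx = 2.0 * base[:, 0] / (w - 1) - 1.0
    gy = 2.0 * base[:, 1] / (h - 1) - 1.0
    return F.grid_sample(feat, torch.stack((gx, gy), dim=-1), align_corners=True)

def segment_frame(frame, backbone, head, cached_feat, is_keyframe, motion_vectors=None):
    """Run the full backbone only on keyframes; on other frames, reuse and warp
    the cached keyframe features with upsampled block motion vectors."""
    if is_keyframe or cached_feat is None:
        cached_feat = backbone(frame)           # expensive; executed sparsely
    else:
        # Block motion vectors (e.g. one per 16x16 block) are upsampled to a
        # dense field matching the feature resolution.
        dense_mv = F.interpolate(motion_vectors, size=cached_feat.shape[-2:],
                                 mode="bilinear", align_corners=False)
        cached_feat = warp_features(cached_feat, dense_mv)   # cheap interpolation
    return head(cached_feat), cached_feat        # lightweight per-frame head
```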
Improving Semantic Video Segmentation by Dynamic Scene Integration
Multi-class image segmentation and pixel-level labeling of the frames that make up a video could be made more efficient by incorporating temporal information. Recently, Convolutional Neural Networks (ConvNets) have made an impressive positive impact on the single image segmentation problem. In this paper, in order to further increase labeling accuracy, we propose a method for integrating short-...
STFCN: Spatio-Temporal FCN for Semantic Video Segmentation
This paper presents a novel method to involve both spatial and temporal features for semantic segmentation of street scenes. Current work on convolutional neural networks (CNNs) has shown that CNNs provide advanced spatial features that support strong performance on the semantic segmentation task. We investigate how involving temporal features also has a good effect on segmenti...
Hierarchical Feature For Scene Parsing Using Fully Recurrent Network
In scene parsing, wide-range contextual information is often not effectively encoded. Scene parsing segments a scene into different regions associated with semantic categories. Its main objective is to reduce the semantic gap between humans and machines in scene understanding. Applications of scene parsing include object detection, text detection o...
Journal: CoRR
Volume: abs/1612.08871
Publication date: 2016